22 research outputs found

An LLVM Instrumentation Plug-in for Score-P

Author: Brendel Ronny
Döbel Sebastian
Herold Christian
Tschüter Ronny
Weber Matthias
Wesarg Bert
Ziegenbalg Johannes
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/12/2017

Reducing application runtime, scaling parallel applications to higher numbers of processes/threads, and porting applications to new hardware architectures are tasks necessary in the software development process. Therefore, developers have to investigate and understand application runtime behavior. Tools such as monitoring infrastructures that capture performance-relevant data during application execution assist in this task. The measured data forms the basis for identifying bottlenecks and optimizing the code. Monitoring infrastructures need mechanisms to record application activities in order to conduct measurements. Automatic instrumentation of the source code is the preferred method in most application scenarios. We introduce a plug-in for the LLVM infrastructure that enables automatic source code instrumentation at compile time. In contrast to available instrumentation mechanisms in LLVM/Clang, our plug-in can selectively include/exclude individual application functions. This enables developers to fine-tune the measurement to the required level of detail while avoiding large runtime overheads due to excessive instrumentation.

arXiv.org e-Print Archive

Crossref

Analysis of Node Failures in High Performance Computers Based on System Logs

Author: Ciorba Florina M.
Ghiasvand Siavash
Nagel Wolfgang E.
Tschüter Ronny
Publication date: 01/01/2015

The growth in size and complexity of HPC systems leads to a rapid increase in their failure rates. In the near future, the mean time between failures of HPC systems is expected to become too short for current failure recovery mechanisms to recover the systems from failures. Early failure detection is thus essential to prevent their destructive effects. Based on measurements of a production system at TU Dresden over an 8-month time period, we study the correlation of node failures in time and space. We infer possible types of correlations and show that in many cases the observed node failures are directly correlated. The significance of such a study lies in achieving a clearer understanding of correlations between observed node failures and enabling failure detection as early as possible. The results aim to help system administrators minimize (or prevent) the destructive effects of failures.

edoc

Lessons learned from spatial and temporal correlation of node failures in high performance computers

Author: Ciorba Florina M.
Ghiasvand Siavash
Nagel Wolfgang E.
Tschüter Ronny
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

In this paper we study the correlation of node failures in time and space. Our study is based on measurements of a production high performance computer over an 8-month time period. We infer possible types of correlations between node failures and show that, in many cases, there are direct correlations between observed node failures. The significance of such a study is twofold: achieving a clearer understanding of correlations between node failures and enabling failure detection as early as possible. The results of this study are aimed at helping system administrators minimize (or even prevent) the destructive effects of correlated node failures.

Crossref

edoc

Performance analysis & optimization of DLR High-Performance Computing codes

Author: Gericke Jana
Huismann Immo
Tschüter Ronny
Wagner Michael
Publication date: 01/01/2022

The importance of High Performance Computing (HPC) in the field of aircraft design is significantly growing. However, capabilities of current HPC systems are inadequate for scale-resolving simulations of in-flight aircraft. Consequently, coordinated advances in algorithms, hardware, and software are needed. For instance, with the end of Moore’s law, processors evolve towards an ever-increasing core count which raises requirements on the scalability of scientific software. Therefore, this submission provides an in-depth scalability analysis of the DLR flow solvers CODA and Musubi. Performance profiles guide the identification of critical aspects of both codes. The presented analysis reveals potentially sub-optimal communication patterns, proposes possible solutions, and shows first improvements of the codes

Institute of Transport Research:Publications

HyperCODA - Towards high-performing time-resolving flow simulations

Author: Fechter Stefan
Huismann Immo
Tschüter Ronny
Wendler Johannes
Publication venue: Springer Nature
Publication date: 01/11/2022

The present work focuses on the performance analysis of the DG-SEM implementation of the CFD solver CODA. The turbulent Taylor-Green vortex is employed as a simple test case for scaling behavior, while more granular kernel benchmarks are used for a more detailed node-level performance analysis. Bottlenecks in the implementation are highlighted and possible solutions are proposed.

Institute of Transport Research:Publications

Extending the Functionality of Score-P through Plugins: Interfaces and Use Cases

Author: Hackenberg Daniel
Ilsche Thomas
Nagel Wolfgang E.
Schuchart Joseph
Schöne Robert
Tschüter Ronny
Publication venue: Springer Science and Business Media LLC
Publication date: 18/10/2017

Performance measurement and runtime tuning tools are both vital in the HPC software ecosystem and use similar techniques: the analyzed application is interrupted at specific events and information on the current system state is gathered to be either recorded or used for tuning. One of the established performance measurement tools is Score-P. It supports numerous HPC platforms and parallel programming paradigms. To extend Score-P with support for different back-ends, create a common framework for measurement and tuning of HPC applications, and to enable the re-use of common software components such as implemented instrumentation techniques, this paper makes the following contributions: (I) We describe the Score-P metric plugin interface, which enables programmers to augment the event stream with metric data from supplementary data sources that are otherwise not accessible for Score-P. (II) We introduce the flexible Score-P substrate plugin interface that can be used for custom processing of the event stream according to the specific requirements of either measurement, analysis, or runtime tuning tasks. (III) We provide examples for both interfaces that extend Score-P’s functionality for monitoring and tuning purposes

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Holistic Performance Analysis of Multi-layer I/O in Parallel Scientific Applications

Author: Tschüter Ronny
Publication date: 18/02/2021

Efficient usage of file systems poses a major challenge for highly scalable parallel applications. The performance of even the most sophisticated I/O subsystems lags behind the compute capabilities of current processors. To improve the utilization of I/O subsystems, several libraries, such as HDF5, facilitate the implementation of parallel I/O operations. These libraries abstract from low-level I/O interfaces (for instance, POSIX I/O) and may internally interact with additional I/O libraries. While improving usability, I/O libraries also add complexity and impede the analysis and optimization of application I/O performance. This thesis proposes a methodology to investigate application I/O behavior in detail. In contrast to existing approaches, this methodology captures I/O activities on multiple layers of the I/O software stack, correlates these activities across all layers explicitly, and identifies interactions between multiple layers of the I/O software stack. This allows users to identify inefficiencies at individual layers of the I/O software stack as well as to detect possible conflicts in the interplay between these layers. To this end, a monitoring infrastructure observes an application and records information about I/O activities of the application during its execution. This work describes options to monitor applications and generate event logs reflecting their behavior. Additionally, it introduces concepts to store information about I/O activities in event logs that preserve hierarchical relations between I/O operations across all layers of the I/O software stack. In combination with the introduced methodology for multi-layer I/O performance analysis, this work provides the foundation for application I/O tuning by exposing patterns in the usage of I/O routines. This contribution includes the definition of I/O access patterns observable in the event logs of parallel scientific applications.
These access patterns originate either directly from the application or from utilized I/O libraries. The introduced patterns reflect inefficiencies in the usage of I/O routines or reveal optimization strategies for I/O accesses. Software developers can use these patterns as a guideline for performance analysis to investigate the I/O behavior of their applications and verify the effectiveness of internal optimizations applied by high-level I/O libraries. After focusing on the analysis of individual applications, this work widens the scope to investigations of coordinated sequences of applications by introducing a top-down approach for performance analysis of entire scientific workflows. The approach provides summarized performance metrics covering different workflow perspectives, from general overview to individual jobs and their job steps. These summaries allow users to identify inefficiencies and determine the responsible job steps. In addition, the approach utilizes the methodology for performance analysis of applications using multi-layer I/O to record detailed performance data about job steps, enabling a fine-grained analysis of the associated execution to exactly pinpoint performance issues. The introduced top-down performance analysis methodology presents a powerful tool for comprehensive performance analysis of complex workflows. On top of their theoretical formulation, this thesis provides implementations of all proposed methodologies. For this purpose, an established performance monitoring infrastructure is enhanced by features to record I/O activities. These contributions complement existing functionality and provide a holistic performance analysis for parallel scientific applications covering computation, communication, and I/O operations. Evaluations with synthetic case studies, benchmarks, and real-world applications demonstrate the effectiveness of the proposed methodologies. The results of this work are distributed as open-source software. 
For instance, the measurement infrastructure including improvements introduced in this thesis is available for download and used in computing centers worldwide. Furthermore, research projects already employ the outcomes of this work.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa
